Goto

Collaborating Authors

 global maximizer


Predictive Entropy Search for Efficient Global Optimization of Black-box Functions

Neural Information Processing Systems

We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hy-perparameters while ES cannot. We evaluate PES in both synthetic and real-world applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.


Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions

Amar Shah, Zoubin Ghahramani

Neural Information Processing Systems

We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first non-greedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real world applications, including problems in machine learning, rocket science and robotics.


Predictive Entropy Search for Efficient Global Optimization of Black-box Functions

Neural Information Processing Systems

We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution. This reformulation allows PES to obtain approximations that are both more accurate and efficient than other alternatives such as Entropy Search (ES). Furthermore, PES can easily perform a fully Bayesian treatment of the model hyperparameters while ES cannot. We evaluate PES in both synthetic and realworld applications, including optimization problems in machine learning, finance, biotechnology, and robotics. We show that the increased accuracy of PES leads to significant gains in optimization performance.



Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions

Neural Information Processing Systems

We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first nongreedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real world applications, including problems in machine learning, rocket science and robotics.


Deception by Omission: Using Adversarial Missingness to Poison Causal Structure Learning

Koyuncu, Deniz, Gittens, Alex, Yener, Bülent, Yung, Moti

arXiv.org Artificial Intelligence

Inference of causal structures from observational data is a key component of causal machine learning; in practice, this data may be incompletely observed. Prior work has demonstrated that adversarial perturbations of completely observed training data may be used to force the learning of inaccurate causal structural models (SCMs). However, when the data can be audited for correctness (e.g., it is crytographically signed by its source), this adversarial mechanism is invalidated. This work introduces a novel attack methodology wherein the adversary deceptively omits a portion of the true training data to bias the learned causal structures in a desired manner. Theoretically sound attack mechanisms are derived for the case of arbitrary SCMs, and a sample-efficient learning-based heuristic is given for Gaussian SCMs. Experimental validation of these approaches on real and synthetic data sets demonstrates the effectiveness of adversarial missingness attacks at deceiving popular causal structure learning algorithms.


Data-Adaptive Discriminative Feature Localization with Statistically Guaranteed Interpretation

Dai, Ben, Shen, Xiaotong, Chen, Lin Yee, Li, Chunlin, Pan, Wei

arXiv.org Artificial Intelligence

In explainable artificial intelligence, discriminative feature localization is critical to reveal a blackbox model's decision-making process from raw data to prediction. In this article, we use two real datasets, the MNIST handwritten digits and MIT-BIH Electrocardiogram (ECG) signals, to motivate key characteristics of discriminative features, namely adaptiveness, predictive importance and effectiveness. Then, we develop a localization framework based on adversarial attacks to effectively localize discriminative features. In contrast to existing heuristic methods, we also provide a statistically guaranteed interpretability of the localized features by measuring a generalized partial $R^2$. We apply the proposed method to the MNIST dataset and the MIT-BIH dataset with a convolutional auto-encoder. In the first, the compact image regions localized by the proposed method are visually appealing. Similarly, in the second, the identified ECG features are biologically plausible and consistent with cardiac electrophysiological principles while locating subtle anomalies in a QRS complex that may not be discernible by the naked eye. Overall, the proposed method compares favorably with state-of-the-art competitors. Accompanying this paper is a Python library dnn-locate (https://dnn-locate.readthedocs.io/en/latest/) that implements the proposed approach.


Global Optimization Networks

Zhao, Sen, Louidor, Erez, Mangylov, Olexander, Gupta, Maya

arXiv.org Machine Learning

We consider the problem of estimating a good maximizer of a black-box function given noisy examples. To solve such problems, we propose to fit a new type of function which we call a global optimization network (GON), defined as any composition of an invertible function and a unimodal function, whose unique global maximizer can be inferred in $\mathcal{O}(D)$ time. In this paper, we show how to construct invertible and unimodal functions by using linear inequality constraints on lattice models. We also extend to \emph{conditional} GONs that find a global maximizer conditioned on specified inputs of other dimensions. Experiments show the GON maximizers are statistically significantly better predictions than those produced by convex fits, GPR, or DNNs, and are more reasonable predictions for real-world problems.


Orthogonal Group Synchronization with Incomplete Measurements: Error Bounds and Linear Convergence of the Generalized Power Method

Zhu, Linglingzhi, Wang, Jinxin, So, Anthony Man-Cho

arXiv.org Machine Learning

Group synchronization refers to estimating a collection of group elements from the noisy pairwise measurements. Such a nonconvex problem has received much attention from numerous scientific fields including computer vision, robotics, and cryo-electron microscopy. In this paper, we focus on the orthogonal group synchronization problem with general additive noise models under incomplete measurements, which is much more general than the commonly considered setting of complete measurements. Characterizations of the orthogonal group synchronization problem are given from perspectives of optimality conditions as well as fixed points of the projected gradient ascent method which is also known as the generalized power method (GPM). It is well worth noting that these results still hold even without generative models. In the meantime, we derive the local error bound property for the orthogonal group synchronization problem which is useful for the convergence rate analysis of different algorithms and can be of independent interest. Finally, we prove the linear convergence result of the GPM to a global maximizer under a general additive noise model based on the established local error bound property. Our theoretical convergence result holds under several deterministic conditions which can cover certain cases with adversarial noise, and as an example we specialize it to the setting of the Erd\"os-R\'enyi measurement graph and Gaussian noise.


SDP Achieves Exact Minimax Optimality in Phase Synchronization

Gao, Chao, Zhang, Anderson Y.

arXiv.org Machine Learning

We study the phase synchronization problem with noisy measurements $Y=z^*z^{*H}+\sigma W\in\mathbb{C}^{n\times n}$, where $z^*$ is an $n$-dimensional complex unit-modulus vector and $W$ is a complex-valued Gaussian random matrix. It is assumed that each entry $Y_{jk}$ is observed with probability $p$. We prove that an SDP relaxation of the MLE achieves the error bound $(1+o(1))\frac{\sigma^2}{2np}$ under a normalized squared $\ell_2$ loss. This result matches the minimax lower bound of the problem, and even the leading constant is sharp. The analysis of the SDP is based on an equivalent non-convex programming whose solution can be characterized as a fixed point of the generalized power iteration lifted to a higher dimensional space. This viewpoint unifies the proofs of the statistical optimality of three different methods: MLE, SDP, and generalized power method. The technique is also applied to the analysis of the SDP for $\mathbb{Z}_2$ synchronization, and we achieve the minimax optimal error $\exp\left(-(1-o(1))\frac{np}{2\sigma^2}\right)$ with a sharp constant in the exponent.